Abstract. The fundamental law for protein folding is the Thermodynamic Principle: the amino acid
sequence of a protein determines its native structure, and the native structure has the minimum Gibbs
free energy. If all chemical problems can be answered by quantum mechanics, there should be a quantum
mechanical derivation of a Gibbs free energy formula G(X) for every possible conformation X of the protein.
We apply quantum statistics to derive such a formula. For simplicity, only monomeric, self-folding globular
proteins are covered.

We point out some immediate applications of the formula and show that it explains the observed
phenomena well. It gives a unified explanation of both folding and denaturation; it explains
why the hydrophobic effect is the driving force of protein folding and clarifies the role played by hydrogen
bonding; it explains the successes and deficiencies of various surface area models. The formula also gives a
clear kinetic force of the folding: $F_i(X) = -\nabla_{x_i} G(X)$. This also gives a natural way to perform the ab
initio prediction of protein structure: minimizing G(X) by Newton's fastest descending method.

1. Introduction 

The newly synthesized peptide chain of a protein automatically folds to its native structure, and only in
this native structure can the protein perform its biological function. A wrong structure will cause disasters
[1]. Why and how the protein folds to its native structure, and how to predict the native structure from only
the knowledge of the peptide chain, are the topics of protein folding [2].

The fundamental law for protein folding is the Thermodynamic Principle: the amino acid sequence of
a protein determines its native structure, and the native structure of the protein has the minimum Gibbs free
energy among all possible conformations [3]. Let X be a conformation of a protein; is there a natural Gibbs
free energy function G(X)? The answer must be positive, as G. N. Lewis said in 1933: "There can be no
doubt but that in quantum mechanics one has the complete solution to the problems of chemistry." (quoted
from [4, page 130].) Protein folding is a problem in biochemistry, so why have we not found such a formula
G(X)? The answer is also ready at hand. In 1929 Dirac wrote: "The underlying physical laws necessary for
the mathematical theory of ... the whole of chemistry are thus completely known, and the difficulty is only
that the exact application of these laws leads to equations much too complicated to be soluble." (quoted
from [4, page 132]). Indeed, the complexity of the Schrödinger equation for protein folding is beyond our
ability to solve, no matter how fast and powerful our computers are. But mathematical theory guarantees that
there is a complete set of eigenvalues (energy levels) and eigenfunctions of the Schrödinger equation in the
Born-Oppenheimer approximation. In statistical mechanics, ensembles classify all (energy) states of the
system. So, although we cannot have exact solutions to the Schrödinger equation, we can
apply the grand canonical ensemble to obtain the desired Gibbs free energy formula G(X). This is the main
idea of our derivation. Interested readers can read the details in Appendix A.

Here we first state the formulae and the assumptions used in deriving them. Then we point out some
immediate applications and use G(X) to explain well-known facts such as the hydrophobic effect and its
relation with hydrogen bonding, the denaturation of proteins, and the success of empirical surface area
models in discriminating native structures from closely nearby compact non-native structures. Other
inferences from G(X), such as the kinetic force in protein folding, the common practice of measuring
$\Delta G$, etc., are also discussed. The derivation itself is put in Appendix A so that uninterested readers
can skip it. In Appendix B we give the kinetic formulae $F_i(X)$.

1.1. Assumptions. All assumptions here are based on well-known facts of consensus. Let $\mathfrak{U}$ be a protein
with M atoms $(a_1, \dots, a_i, \dots, a_M)$. A structure of $\mathfrak{U}$ is a point
$X = (x_1, \dots, x_i, \dots, x_M) \in \mathbb{R}^{3M}$, where $x_i \in \mathbb{R}^3$
is the atomic center (nuclear) position of $a_i$. Alternatively, the conformation X corresponds to a subset of
$\mathbb{R}^3$, $P_X = \cup_{i=1}^{M} B(x_i, r_i) \subset \mathbb{R}^3$, where $r_i$ is the van der Waals radius of the atom $a_i$ and
$B(x, r) = \{y \in \mathbb{R}^3 : |y - x| \le r\}$ is a closed ball with radius r and center x.

(1) The proteins discussed here are monomeric, single-domain, self-folding globular proteins.

(2) Therefore, in the case of our selected proteins, the environment of the protein folding, the
physiological environment, is pure water; there are no other elements in the environment, no chaperones,
no co-factors, etc. This is a rational simplification, at least when one considers the environment as
only the first hydration shell of a conformation, as in our derivation of G(X).

(3) During the folding, the environment does not change. 

(4) Anfinsen [3] showed that before folding, the polypeptide chain already has its main chain and each
residue's covalent bonds correctly formed. Hence, our conformations should satisfy the following
steric conditions set in [5] and [6]: there are $\epsilon_{ij} > 0$, $1 \le i < j \le M$, such that for any two atoms $a_i$
and $a_j$ in $P_X = \cup_{k=1}^{M} B(x_k, r_k)$,

$$\epsilon_{ij} \le |x_i - x_j|, \quad \text{no covalent bond between } a_i \text{ and } a_j;$$

$$d_{ij} - \epsilon_{ij} \le |x_i - x_j| \le d_{ij} + \epsilon_{ij}, \quad d_{ij} \text{ is the standard bond length between } a_i \text{ and } a_j. \qquad (1)$$

We will denote the set of all conformations satisfying (1) by $\mathfrak{X}$ and only consider $X \in \mathfrak{X}$ in this paper.

(5) A water molecule is taken as a single particle, centered at $w \in \mathbb{R}^3$, the oxygen nuclear position, and
the covalent bonds in it are fixed. In the Born-Oppenheimer approximation, only the conformation
X is fixed; all particles, water molecules or electrons, in the first hydration shell of $P_X$ are moving.

(6) We agree that simply classifying amino acids as hydrophobic or hydrophilic is an
oversimplification [7]. All atoms should be classified according to the hydrophobicity of the moieties or atom
groups they belong to. Suppose there are H hydrophobic levels $H_i$, $i = 1, \dots, H$, such that
$\cup_{i=1}^{H} H_i = (a_1, \dots, a_i, \dots, a_M)$. For example, we may assume that the classification is as in [7]: there are
H = 5 classes, C, O/N, O$^-$, N$^+$, S. Unlike in [7], we also classify every hydrogen atom into one of
the H classes according to the atom it is bonded to. There are many different hydrophobicity
classifications. Our derivation is valid for any of them.
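
The steric conditions of assumption (4) can be checked mechanically for a candidate conformation. The following is a minimal sketch and not the implementation used in the cited works; the function name, the representation of bonds as a set of index pairs, and the callables `eps` and `bond_len` (returning $\epsilon_{ij}$ and $d_{ij}$) are assumptions made for illustration, and the numbers in the example are made up.

```python
import math
from itertools import combinations


def satisfies_steric_conditions(coords, bonds, eps, bond_len):
    """Check the steric conditions of assumption (4) for every atom pair.

    coords   : list of (x, y, z) nuclear positions, one per atom
    bonds    : set of frozenset({i, j}) for covalently bonded pairs (assumed input)
    eps      : callable (i, j) -> epsilon_ij > 0 (hypothetical parameter table)
    bond_len : callable (i, j) -> standard bond length d_ij (bonded pairs only)
    """
    for i, j in combinations(range(len(coords)), 2):
        r = math.dist(coords[i], coords[j])
        if frozenset((i, j)) in bonds:
            # Bonded pair: the distance must stay within eps_ij of d_ij.
            if abs(r - bond_len(i, j)) > eps(i, j):
                return False
        elif r < eps(i, j):
            # Non-bonded pair: the distance must be at least eps_ij.
            return False
    return True


# Illustrative three-atom chain with made-up values (not real parameters).
bonds = {frozenset((0, 1)), frozenset((1, 2))}
eps = lambda i, j: 0.1 if frozenset((i, j)) in bonds else 2.0
bond_len = lambda i, j: 1.5
chain = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(satisfies_steric_conditions(chain, bonds, eps, bond_len))
```

The check visits all M(M-1)/2 pairs, which is adequate for a sketch; a practical implementation would presumably use a neighbor list.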

1.2. The Formula. The formula has two versions. The chemical balance version is:

$$G(X) = \mu_e N_e(X) + \sum_{i=1}^{H} \mu_i N_i(X), \qquad (2)$$

where $N_e(X)$ is the mean number of electrons in the space enclosed by the first hydration shell of X, and $\mu_e$ is its
chemical potential; $N_i(X)$ is the mean number of water molecules in the first hydration layer that directly
contact the atoms in $H_i$, and $\mu_i$ is their chemical potential.

Let $M_X$ (see Figure 1) be the molecular surface for the conformation X, and define $M_{X_i} \subset M_X$ as the set
of points in $M_X$ that are closer to atoms in $H_i$ than to any atoms in $H_j$, $j \ne i$. Then the geometric version of
G(X) is:

$$G(X) = a\mu_e V(\Omega_X) + a d_w \mu_e A(M_X) + \sum_{i=1}^{H} \nu_i \mu_i A(M_{X_i}), \quad a, \nu_i > 0, \qquad (3)$$

where $V(\Omega_X)$ is the volume of the domain $\Omega_X$ enclosed by $M_X$, $d_w$ is the diameter of a water molecule, and
$A(M_X)$ and $A(M_{X_i})$ are the areas of $M_X$ and $M_{X_i}$, with $a[V(\Omega_X) + d_w A(M_X)] = N_e(X)$ and
$\nu_i A(M_{X_i}) = N_i(X)$, $1 \le i \le H$.
The constants a and $\nu_i$ are independent of X; they are the average numbers of particles per unit volume and area.
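
To make the relation between the two versions concrete, the sketch below evaluates both and checks that they agree when the particle numbers are obtained from the geometric quantities as stated above. The function names and all numerical values are placeholders chosen for illustration; they are not measured chemical potentials or densities, and computing the volume and the surface areas of a real conformation is outside the scope of this sketch.

```python
import math


def gibbs_chemical(mu_e, n_e, mu, n):
    """Chemical balance version: G = mu_e * N_e + sum_i mu_i * N_i."""
    return mu_e * n_e + sum(m * k for m, k in zip(mu, n))


def gibbs_geometric(a, d_w, mu_e, volume, area, nu, mu, class_areas):
    """Geometric version: G = a*mu_e*V + a*d_w*mu_e*A + sum_i nu_i*mu_i*A_i."""
    return (a * mu_e * volume
            + a * d_w * mu_e * area
            + sum(v * m * s for v, m, s in zip(nu, mu, class_areas)))


# Placeholder inputs (hypothetical units and values).
a, d_w, mu_e = 0.3, 2.8, -1.0
nu, mu = [0.1, 0.1], [0.5, -0.4]
volume, class_areas = 1000.0, [200.0, 300.0]
area = sum(class_areas)  # the class subsurfaces partition the molecular surface

# Particle numbers implied by the geometry.
n_e = a * (volume + d_w * area)
n = [v * s for v, s in zip(nu, class_areas)]

g_chem = gibbs_chemical(mu_e, n_e, mu, n)
g_geom = gibbs_geometric(a, d_w, mu_e, volume, area, nu, mu, class_areas)
print(math.isclose(g_chem, g_geom))
```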

2. Applications 

2.1. Structure Prediction. Prediction of protein structures is the most important method for revealing
proteins' functions and working mechanisms, and it has become a bottleneck in the rapidly developing life sciences. With
more and more powerful computers, this problem is being attacked on all fronts. Various models are used to achieve
the goal, homologous or ab initio, full-atom or coarse-grained, with numerous parameters of which
many are quite arbitrary. But although our computing power grows exponentially, prediction power does
not follow. At this moment, we should take a deep breath and recall what the great physicist
Fermi said: "There are two ways of doing calculations in theoretical physics. One way, and this is the way I
prefer, is to have a clear physical picture of the process that you are calculating. The other way is to have a
precise and self-consistent mathematical formalism." And "I remember my friend Johnny von Neumann used
to say, with four parameters I can fit an elephant, and with five I can make him wiggle his trunk." Quoted
from [8].

These remarks should apply to any scientific calculation, not just theoretical physics. In the current
situation, all ab initio prediction models are actually empirical, with many parameters to ensure some
success. Fermi's comments remind us that a theory should be based on fundamental physical laws and
contain no arbitrary parameters. Looking at formulae (2) and (3), we see immediately that they are neat,
precise, and self-consistent mathematical formulae. Furthermore, they include no arbitrary parameters; all
terms in them have clear physical meanings. The chemical potentials $\mu_e$ and $\mu_i$'s and the geometric constants a and
$\nu_i$'s can be evaluated by theory or experiment; they are not arbitrary at all.

But a theory has to be developed and tested until it is justified or falsified. For interested researchers, the tasks
are to determine the correct values of the chemical potentials in (2) and the geometric ratios a and $\nu_i$ in (3).
There are many estimates of them, but they are either for the solvent accessible surface area, such as in [7],
and hence do not fit the experimental data, as pointed out in [9], or do not distinguish different hydrophobicity
levels, as in [9]. To get the correct values of the chemical potentials and geometric constants, the commonly
used method of training with data can be employed, in which we can also test the formulae's ability to
discriminate native structures from nearby compact non-native structures. After that, a direct test is to predict the
native structure $X_N$ from the amino acid sequence of a protein by minimizing:

$$G(X_N) = \inf_{X \in \mathfrak{X}} G(X). \qquad (4)$$

This is the first time that we have a theoretically derived formula of the Gibbs free energy. Before this, all ab
initio predictions were not really ab initio. A combined (theoretical and experimental) search for the values
of the chemical potentials will be the key to the success of the ab initio prediction of protein structure.

2.2. Energy Surface or Landscape. An obvious application is the construction of the Gibbs free energy
surface or landscape. We no longer need any empirical estimate: the Gibbs free energy formula
$G : \mathfrak{X} \to \mathbb{R}$ gives a graph (X, G(X)) over the space $\mathfrak{X}$ (all eligible conformations for a given protein), and
this is nothing but the Gibbs free energy surface. Mathematically it is a 3M-dimensional hypersurface. The
characteristics that concern students of energy surface theory, such as how rugged it is, how many local
minima there are, whether there is a funnel, etc., can be answered by simple calculations with the formula.

Since the function G is actually defined on the whole $\mathbb{R}^{3M}$ (a domain of $\mathbb{R}^{3M}$ containing all of $\mathfrak{X}$ is
enough), we can use mathematical tools to study its graph and compare the results with the restricted
conformations. One important question is: does the absolute minimum structure belong to $\mathfrak{X}$?

2.3. Kinetics. It has been observed that when we apply the thermodynamic principle, a difficulty is that we do
not have kinetics and have to use other methods to supply it [10]. The advantage of a theoretical formula for
the Gibbs free energy in the form G(X) is that it connects the thermodynamics with the kinetics. In fact, for
any atomic position $x_i$, the kinetic force is $F_i(X) = -\nabla_{x_i} G(X)$ [11]. With formula (3) these quantities are
really calculable; mathematical formulae and implementations on the molecular surface $M_X$ are given in [12].
We will give the mathematical formulae in Appendix B. The resulting Newton's fastest descending method
was used in the simulation in [6].
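
The idea of following the force $-\nabla_{x_i} G(X)$ downhill can be sketched generically. The code below is an illustration only: it estimates the force by central finite differences rather than by the analytic surface formulae of Appendix B, it uses a fixed step size, it ignores the steric constraints, and `toy_G` is a made-up quadratic standing in for G(X). None of these choices come from the cited simulation.

```python
def force(G, X, h=1e-5):
    """Central-difference estimate of F = -grad G at the flat coordinate list X."""
    F = []
    for k in range(len(X)):
        up = X[:k] + [X[k] + h] + X[k + 1:]
        dn = X[:k] + [X[k] - h] + X[k + 1:]
        F.append(-(G(up) - G(dn)) / (2 * h))
    return F


def descend(G, X, step=0.1, iters=200):
    """Repeatedly move the coordinates along the force; return the final point."""
    X = list(X)
    for _ in range(iters):
        F = force(G, X)
        X = [x + step * f for x, f in zip(X, F)]
    return X


def toy_G(X):
    """Hypothetical stand-in for G(X): a quadratic bowl with minimum at (1, -2, 0.5)."""
    target = [1.0, -2.0, 0.5]
    return sum((x - t) ** 2 for x, t in zip(X, target))


start = [0.0, 0.0, 0.0]
print(force(toy_G, start))    # force at the starting point
print(descend(toy_G, start))  # approaches the minimum of the toy function
```

For the actual problem the minimization is constrained by the steric conditions, so a plain descent like this one would need a projection or penalty step that is not shown here.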
3. Discussions

We treat protein folding theoretically by introducing quantum statistics. A theory is useful
only if it can explain the observed facts and if it can simplify and improve research methods
as well as clarify concepts. We will show that G(X) can do exactly these.



[Figure 1: sketch showing the accessible surface and the molecular surface]

Figure 1. Two-dimensional presentation of the molecular surface [13] and the solvent accessible surface [13]. This
figure was originally in [6].

Figure 2. Note that the $\mathcal{R}_{X_i}$ generally are not connected, i.e., they have more than one block.

If the same theoretical result can be derived from two different disciplines, it is often not just by chance.
We will show that an early phenomenological mathematical model [5], starting from purely geometric reasoning,
arrived at formula (3) with just two hydrophobic levels, hydrophobic and hydrophilic.

A theory also has to be falsifiable, that is, it has to make a prediction that can be checked. The fundamental prediction
is that by minimizing formula (2) or (3) we will get the native structures from the amino acid sequences of
proteins covered by the assumptions of the formulae. That can only be done after we have the actual values,
in the physiological environment, of the chemical potentials appearing in the formulae.

3.1. Unified Explanation of Folding and Denaturation. Protein denaturation happens easily, even
if the environment is only slightly changed, as described in [15] by Hsien Wu in 1931. (The reference [15] is the
13th article, which theorizes the results of a series of experiments; a preliminary report was read before the
XIIIth International Congress of Physiology at Boston, August 19-24, 1929, and published in the Am. J.
Physiol. for October 1929. In it Hsien Wu first suggested that the denatured protein is still the same
molecule, only its structure has been changed.) Anfinsen showed in various experiments that after denaturation
by a changed environment, if the denaturing agent is removed, certain globular proteins can spontaneously refold
to their native structure [3]. The spontaneous renaturation suggests that protein folding does not need outside
help, at least for the class of proteins under study. Therefore, the fundamental law of thermodynamics asserts
that in an environment in which a protein can fold, the native structure must have the minimum Gibbs free
energy. The same is true for denaturation: under the denaturing environment, the native structure no longer
has the minimum Gibbs free energy; some other structure(s) will have the minimum Gibbs free energy. Thus,
letting En denote the environment, any formula of Gibbs free energy should be stated as G(X, En) instead of just
G(X), unless the environment is specified as in this paper. Let $En_N$ be the physiological environment and
$En_U$ be some denaturing environment, $X_N$ be the native structure and $X_U$ be one of the stable denatured
structures in $En_U$; then the thermodynamic principle for both protein folding and unfolding should be
that

$$G(X_N, En_N) < G(X_U, En_N), \qquad G(X_N, En_U) > G(X_U, En_U). \qquad (5)$$

To check this, an experiment should be designed that can suddenly put proteins in a different environment.
Formulae (2) and (3) should be written as $G(X, En_N)$. Indeed, the chemical potentials $\mu_e$ and $\mu_i$'s are Gibbs
free energies per corresponding particle, $\mu = u + Pv - Ts$. Two environment parameters, temperature T
and pressure P, explicitly appear in $\mu$; the internal energy u and entropy s may also implicitly depend on the
environment. According to formulae (2) and (3), if $\mu_i < 0$, then exposing more $H_i$ atoms to water
(making $A(M_{X_i})$ larger) will reduce the Gibbs free energy. If $\mu_i > 0$, then the reverse will happen. To increase
or reduce the $H_i$ atoms' exposure to water ($A(M_{X_i})$), the conformation has to change. The conformation
keeps adjusting until we get a conformation $X_N$ such that the net effect of any change of it will either
increase some $H_i$ atoms' exposure to water while $\mu_i > 0$ or reduce $H_i$ atoms' exposure to water while $\mu_i < 0$.
In other words, $G(X, En_N)$ achieves its minimum at $G(X_N, En_N)$. Protein folding, at least for the
proteins considered in the assumptions, is explained very well by formulae (2) and (3).

In a changed environment, the chemical potentials $\mu_e$ and $\mu_i$'s in formulae (2) and (3) change their values.
Thus $G(X, En_U)$ has the same form as $G(X, En_N)$ but different chemical
potentials. Therefore, the structure $X_U$ will be stable, according to the second inequality in (5). The process
is exactly the same as described for protein folding, provided that the method of changing the environment does not
introduce new kinds of (non-water) particles, for example, if we only change temperature or pressure.

Even if the new environment includes new kinds of particles, formulae (2) and (3) can still partially explain
the denaturation; only more obstructions prevent the protein from denaturing to $X_U$, but in any case it will end
in some structure other than $X_N$, and the protein is denatured. Actually, this is a hint of how to modify the
current formulae to extend them to general proteins.
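
The dependence of the stable structure on the environment can be illustrated numerically. In the sketch below, an environment is just a set of values for the chemical potentials and geometric constants, and a conformation is summarized by its volume and class surface areas. All names and numbers are hypothetical and chosen only so that the two inequalities of the folding and unfolding principle hold; they are not fitted to any protein.

```python
def gibbs(env, conf):
    """Geometric free energy formula for one conformation in one environment."""
    total_area = sum(conf["class_areas"])
    bulk = env["a"] * env["mu_e"] * (conf["volume"] + env["d_w"] * total_area)
    surface = sum(nu * mu * area
                  for nu, mu, area in zip(env["nu"], env["mu"], conf["class_areas"]))
    return bulk + surface


# Class 0 is hydrophobic, class 1 is hydrophilic (two-class toy setting).
physiological = {"a": 0.01, "d_w": 2.8, "mu_e": 0.1, "nu": [0.1, 0.1], "mu": [1.0, -1.0]}
# Hypothetical denaturing environment: exposing hydrophobic atoms is now favorable.
denaturing = {"a": 0.01, "d_w": 2.8, "mu_e": 0.1, "nu": [0.1, 0.1], "mu": [-0.5, -1.0]}

native = {"volume": 1000.0, "class_areas": [100.0, 400.0]}    # compact, buried core
unfolded = {"volume": 1200.0, "class_areas": [400.0, 500.0]}  # expanded, exposed core

for name, env in [("physiological", physiological), ("denaturing", denaturing)]:
    stable = min([("native", native), ("unfolded", unfolded)],
                 key=lambda item: gibbs(env, item[1]))[0]
    print(name, "->", stable)
```

Only the sign change of the hydrophobic chemical potential separates the two environments here, which is the mechanism described in the text for changes of temperature or pressure.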
3.2. Why G(X) Instead of ΔG(X). Here is a chance to explain why we use G(X) instead of $\Delta G(X)$.
In various experiments testing the Gibbs free energy difference ($\Delta G$) between $X_N$ and $X_U$, the common
practice is essentially to set

$$\Delta G = G(X_U, En_U) - G(X_N, En_N), \qquad (6)$$

see, for example, [16]. Though some interpolation was used to adjust, that is not the experimental
observation. But formulae (2) and (3) suggest that what we need is

$$\Delta G = G(X_U, En_N) - G(X_N, En_N). \qquad (7)$$

Unfortunately, there is currently no method of denaturation that does not change the environment.
Therefore, there is no way to measure experimentally the $\Delta G$ in (7). We should reexamine the conclusion
that $\Delta G$ is very small, because it was essentially drawn from (6). Thus, although we believe that it is true, the
conclusion was achieved neither via theory nor via real experimental observation.

While experiment has no way to change the native structure without disturbing the environment, theory
can play a role instead. Formulae (2) and (3) give us the chance to compare $\Delta G$, as long as we have accurate
chemical potentials.

3.3. Explanation of the Hydrophobic Effect and the Role Played by Hydrogen Bonding. In 1959, by
reviewing the literature, Kauzmann concluded that the hydrophobic effect is the main driving force in protein
folding [17]. An empirical correlation between hydrophobic free energy and aqueous cavity surface area was
noted as early as 1974 [18], giving justification of the hydrophobic effect. Various justifications of the
hydrophobic effect were published, based on empirical models of protein folding, for example, [19]. But the debate
continues to the present; some still insist that it is the hydrogen bond, instead of the hydrophobic effect, that plays the
main role of driving force in protein folding, for example, [20]. The theoretically derived formulae (2) and
(3) can explain why the hydrophobic effect is indeed the driving force. A simulation of reducing hydrophobic
area alone ([5]) can explain the intra-molecular hydrogen bonds.

In fact, according to formulae (2) and (3), if $\mu_i < 0$, then making more $H_i$ atoms appear on the boundary
of $P_X$ will reduce the Gibbs free energy. If $\mu_i > 0$, then the reverse will happen: reducing the exposure of $H_i$
atoms to water will reduce the Gibbs free energy. This gives a theoretical explanation of the hydrophobic
effect. The kinetic formulae $F_i = -\nabla_{x_i} G(X)$ and those given in Appendix B are the forces that push the
conformation to change to the native structure.

The mechanism stated above works through the chemical potentials $\mu_i$ for the various levels of hydrophobicity.
In the physiological environment, all hydrophobic $H_i$'s will have positive $\mu_i$ and all hydrophilic $H_i$'s will have negative
$\mu_i$. Thus, in changing the conformation $P_X$, the most hydrophilic $H_i$ ($\mu_i = \min(\mu_1, \dots, \mu_H)$) gets the first
priority to appear on the boundary, and the most hydrophobic $H_i$ ($\mu_i = \max(\mu_1, \dots, \mu_H)$) gets the first
priority to hide in the hydrophobic core to avoid contacting water molecules, etc. We should keep in
mind that the steric conditions (1) have to be obeyed at all times.
But the hydrophobic effect actually works partially through hydrogen bond formation. This is well
represented in the chemical potentials in (2) and (3). In fact, the values of the chemical potentials reflect the
ability of the atoms or atom groups to form hydrogen bonds, either with another atom group in the protein or
with water molecules. This gives a way to theoretically or experimentally determine the values of hydrophilic
chemical potentials: checking the actual energy of the hydrogen bond.

For hydrophobic ones, it will be more complicated; the common understanding is that exposure reduces the entropy,
which certainly comes from the inability to form hydrogen bonds with water molecules. Hence, although the
hydrophobic effect is the driving force of protein folding, it works through the atoms' ability or inability to
form hydrogen bonds with water molecules.

How can the intra-molecular hydrogen bonds be explained? It seems that formulae (2) and (3) do not address
this issue. A possible theory is that the amino acid sequence of a protein is highly selected in evolution;
in fact, only a tiny number of amino acid sequences can really become a protein. With these specially
selected sequences, while the various hydrophobic surfaces shrink to form a hydrophobic core, residues
are put in position to form secondary structures and their associated hydrogen bonds. This sounds a little
too arbitrary. But a simulation of shrinking the hydrophobic surface area alone indeed produced secondary
structures and hydrogen bonds. The simulation was reported in [5]. Without calculating any dihedral
angles or electronic charges, without any arbitrary parameter, and paying no attention to any particular atom's
position, just by reducing the hydrophobic surface area (there it was assumed that there are only two kinds
of atoms, hydrophobic and hydrophilic), secondary structures and hydrogen bonds duly appeared. The
proteins used in the simulation are 2i9c, 2hng, and 2ib0, with 123, 127, and 162 residues. No simulation
with any kind of empirical or theoretical model had achieved such a success. More than anything, this
simulation should show that the hydrophobic effect alone gives more chance of forming intra-molecular
hydrogen bonds. Indeed, pushing hydrophilic atoms to make hydrogen bonds with water molecules gives
other non-boundary hydrophilic groups more chance to form intra-molecular hydrogen bonds.

Again, formula (3) can partly explain the success of this simulation: when there are only two hydrophobic
classes in (3), the hydrophobic area represents the main positive part of the Gibbs free energy, so reducing it
reduces the Gibbs free energy, no matter what the real value of the chemical potential is.

3.4. Explanation of the Successes of Surface Area Models. In 1995, Wang et al. [21] compared 8
empirical energy models by testing their ability to distinguish native structures from their close neighboring
compact non-native structures. Their WZS models are accessible surface area models with 14 classes of
atoms, $\sum_{i=1}^{14} \sigma_i A_i$. Each combination of two of the three target proteins was used to train WZS to get the $\sigma_i$,
hence there are three models WZS1, WZS2, and WZS3. Among the 8 models, all the WZS models performed the
best, distinguishing all 6 target proteins. The worst performer was the force field AMBER 4.0, which failed to
distinguish any of the 6 targets.
These tests and the successes of various surface area models such as [7] showed that, instead of tracking
numerous pairwise atomic interactions, the surface area models, though they look too simple, have surprising
power. Now formula (3) gives them a theoretical justification. On the other hand, the successes of these
models also reinforce the theoretical results.

There is a gap between the accessible surface area model in [7] and the experimental results (surface tension),
as pointed out in [9]. The gap disappears when one uses the molecular surface area to replace the accessible
surface area: in [9] it was shown that a molecular surface area assignment of 72-73 cal/mol/Å$^2$ fits perfectly with
the macroscopic experimental data. Later it was asserted that the molecular surface is the real boundary of a
protein in its native structure [22].

Figure 1 and Figure 2 show the water molecules in contact with $P_X$ together with the accessible surface and the molecular
surface. We see that all water molecules must be outside the molecular surface $M_X$, but the accessible
surface lies in the middle of the first hydration shell. So it is better to use the molecular surface $M_X$ as the
boundary of the conformation $P_X$. Moreover, the conversion of the mean numbers $N_i(X)$ to surface areas,
$N_i(X) = \nu_i A(M_{X_i})$, only works for the molecular surface, not for the accessible surface. This can explain
the conclusions in [9] and [22].

In fact, the advantage of the solvent accessible surface is that by its definition we know exactly which
part of the surface each atom occupies; therefore, one can calculate each atom's share of the surface area. This fact
may partly account for why there are so many models based on the solvent accessible surface, even though people knew
about the aforementioned gap. For other surfaces, we have to define the part of the surface that belongs to a specific
hydrophobicity class. This was resolved in [5] via the distance function definition that we use here.

All surface area models neglect one element, the volume of the structure. As early as the 1970s,
Richards and his colleagues pointed out that the native structure of globular proteins is very dense,
or compact (density = 0.75, [13]). To make a conformation denser, obviously we should shrink the volume
$V(\Omega_X)$. The model in [5] introduced a volume term but kept the oversimplification that all atoms are either
hydrophobic or hydrophilic. The derivation of (2) and (3) shows that the volume term should be counted, but
it may be that $a\mu_e$ is very small, in which case the volume may really be irrelevant.

3.5. Coincidence with Phenomenological Mathematical Model. If a theoretical result can be derived
from two different disciplines, the likelihood that it is correct increases dramatically. Indeed, from a purely
geometric consideration, a phenomenological mathematical model, $G(X) = aV(\Omega_X) + bA(M_X) + cA(M_{X_1})$,
$a, b, c > 0$ (it was assumed that there are only two hydrophobicity levels, hydrophobic and hydrophilic; the
hydrophilic surface area $A(M_{X_2})$ is absorbed in $A(M_X)$ by $A(M_{X_2}) = A(M_X) - A(M_{X_1})$), was created in [5].
It was based on the well-known global geometric characteristics of the native structure of globular proteins:
1. high density; 2. smaller surface area; 3. hydrophobic core, as demonstrated and summarized in [13] and
[23]. So, to obtain the native structure, we should shrink the volume (increasing the density) and the surface
area, and form a better hydrophobic core (reducing the hydrophobic surface area $A(M_{X_1})$), simultaneously
and cohesively.

The agreement between formula (3) and the phenomenological mathematical model of [5] cannot be just a
coincidence. Most likely, it is the same natural law reflected in different disciplines. The advantage of (3) is
that everything there has its physical meaning.
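
The correspondence between the two models can be made explicit for two hydrophobicity classes by substituting $A(M_{X_2}) = A(M_X) - A(M_{X_1})$ into the geometric formula and collecting terms. The sketch below does this substitution and checks it numerically. The function names and the numerical values are assumptions for illustration; class index 0 is taken as hydrophobic and index 1 as hydrophilic.

```python
import math


def geometric_G(a, d_w, mu_e, nu, mu, volume, class_areas):
    """Geometric formula; the total area A(M_X) is the sum of the class areas."""
    area = sum(class_areas)
    return (a * mu_e * volume
            + a * d_w * mu_e * area
            + sum(n * m * s for n, m, s in zip(nu, mu, class_areas)))


def two_class_coefficients(a, d_w, mu_e, nu, mu):
    """Coefficients (a', b, c) of a'V + bA + cA_1 implied by the formula when H = 2.

    Obtained by substituting A_2 = A - A_1 and collecting the terms in V, A, A_1.
    """
    a_prime = a * mu_e
    b = a * d_w * mu_e + nu[1] * mu[1]
    c = nu[0] * mu[0] - nu[1] * mu[1]
    return a_prime, b, c


# Placeholder parameters (hypothetical values, not fitted to data).
params = dict(a=0.02, d_w=2.8, mu_e=1.0, nu=[0.1, 0.12], mu=[0.8, -0.3])
V, A1, A2 = 1500.0, 220.0, 610.0

a_prime, b, c = two_class_coefficients(**params)
lhs = geometric_G(volume=V, class_areas=[A1, A2], **params)
rhs = a_prime * V + b * (A1 + A2) + c * A1
print(math.isclose(lhs, rhs))
```

Under this mapping c is positive whenever the hydrophobic chemical potential is positive and the hydrophilic one is negative, while the sign of b depends on the actual values of the constants.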

3.6. Potential Energy Plays No Role in Protein Folding. Formulae (2) and (3) theoretically show that
the hydrophobic effect is the driving force of protein folding; it is not just a solvent free energy added to the pairwise
interactions such as the Coulomb interaction, etc., as all force fields assume. Only in the physiological environment
does the hydrophobic effect work towards the native structure; otherwise it will push towards denaturation, as discussed
in the explanation of folding and unfolding. Formulae (2) and (3) show that the Gibbs free energy is actually
independent of the potential energy, which is counterintuitive and a bit surprising. The explanation is that
during the folding process all covalent bonds in the main chain and each side chain are kept invariant; the
potential energy has already played its role in the synthesis process of forming the peptide chain, which of
course can also be described by quantum mechanics. According to Anfinsen [3], protein folding takes place after the
synthesis of the whole peptide chain, so we can skip the synthesis process and concentrate on the folding
process.

The steric conditions (1) simply preserve this early synthesis result: not every $X = (x_1, \dots, x_i, \dots, x_M)$ is
eligible to be a conformation; it has to satisfy (1). The steric conditions not only respect the bond
lengths, they also reflect many physico-chemical properties of a conformation. They are defined via the allowed
minimal atomic distances, such that for non-bonded atoms the allowed minimal distances are: shorter
between differently charged or polarized atoms; a little longer between non-polar ones; and much longer
(generally greater than the sum of their radii) between like-charged ones, etc. For example, we allow a
minimal distance between sulfur atoms in cysteines that lets disulfide bonds form. And for any newly found
covalent bond between side chains, we can easily modify the steric conditions to allow it to form during
folding, though it may not necessarily form.

In particular, the steric conditions must be kept in the minimization of G(X); thus the minimization in (4) is a
constrained minimization. This, unfortunately, is a drawback: it increases the mathematical difficulty.

4. Conclusion 

A quantum statistical theory of protein folding for monomeric, single-domain, self-folding globular proteins
is suggested. The assumptions of the theory fit all observed realities of protein folding. The resulting formulae
(2) and (3) do not have any arbitrary parameters, and all terms in them have clear physical meaning. Potential
energies involving pairwise interactions between atoms do not appear in them.

Formulae (2) and (3) have explanatory power. They give a unified explanation of folding and denaturation,
and of the hydrophobic effect in protein folding and its relation with hydrogen bonding. The formulae
also explain the relative successes of surface area protein folding models. The relation between the kinetics and
thermodynamics of protein folding is discussed, and the driving force formula that comes from the Gibbs free energy
formula (3) is also given. Energy surface theory will be much easier to handle. The concept of $\Delta G$ is
clarified.

Appendix A. The Derivation 

Let $d_w$ be the diameter of a water molecule and $M_X$ be the molecular surface of $P_X$ as defined in [13]
with the probe radius $d_w/2$, see Figure 1. Define

$$\mathcal{R}_X = \{x \in \mathbb{R}^3 : \mathrm{dist}(x, M_X) \le d_w\} \setminus P_X \qquad (8)$$

as the first hydration shell surrounding $P_X$, where $\mathrm{dist}(x, S) = \inf_{y \in S} |x - y|$. Then $\mathcal{T}_X = P_X \cup \mathcal{R}_X$ will be
our thermodynamic system of protein folding at the conformation X.

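We classify the atoms in $\mathfrak{U}$ into $H$ hydrophobicity classes $H_i$, $i = 1, \dots, H$, such that $\bigcup_{i=1}^{H} H_i = \{a_1, a_2, \dots, a_M\}$. Let $I_i \subset \{1, 2, \dots, M\}$ be the subset such that $a_j \in H_i$ if and only if $j \in I_i$. Define
$P_{X,i} = \bigcup_{j \in I_i} B(\mathbf{x}_j, r_j) \subset P_X$ and, as shown in Figure 2,

$$\mathcal{R}_{X,i} = \{x \in \mathcal{R}_X : \operatorname{dist}(x, P_{X,i}) \le \operatorname{dist}(x, P_X \setminus P_{X,i})\}, \quad 1 \le i \le H. \tag{9}$$

Let $V(\Omega)$ be the volume of $\Omega \subset \mathbb{R}^3$, then

$$\mathcal{R}_X = \bigcup_{i=1}^{H} \mathcal{R}_{X,i}, \quad V(\mathcal{R}_X) = \sum_{i=1}^{H} V(\mathcal{R}_{X,i}), \quad \text{and for } i \ne j,\ V(\mathcal{R}_{X,i} \cap \mathcal{R}_{X,j}) = 0. \tag{10}$$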
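The partition rule (9) can be illustrated numerically. The following is a minimal sketch, not the paper's implementation: each atom is a ball $B(\mathbf{x}_j, r_j)$ carrying a class label, the distance from a point to $P_{X,i}$ is taken as the distance to the nearest ball of class $i$, and ties go to the class listed first. The function names and the toy coordinates are hypothetical, and the sketch does not test whether a point actually lies in the hydration shell $\mathcal{R}_X$ (that would require the molecular surface $M_X$).

```python
import math

def dist_to_ball(x, center, radius):
    """Distance from point x to the closed ball B(center, radius); 0 inside the ball."""
    return max(0.0, math.dist(x, center) - radius)

def dist_to_class(x, atoms, cls):
    """dist(x, P_{X,i}): distance to the union of the balls whose class is cls."""
    return min(dist_to_ball(x, c, r) for c, r, k in atoms if k == cls)

def shell_class(x, atoms, classes):
    """Class index i such that x falls in R_{X,i} by rule (9): the nearest class."""
    return min(classes, key=lambda i: dist_to_class(x, atoms, i))

# Toy data (hypothetical): (center, radius, hydrophobicity class)
atoms = [((0.0, 0.0, 0.0), 1.7, 1),
         ((2.5, 0.0, 0.0), 1.5, 2),
         ((0.0, 2.5, 0.0), 1.7, 1)]
classes = [1, 2]

# Sample points assumed to lie in the hydration shell
points = [(-2.5, 0.0, 0.0), (4.5, 0.0, 0.0), (1.0, 4.5, 0.0)]
labels = [shell_class(p, atoms, classes) for p in points]
print(labels)
```

Assigning every sample point of the shell to exactly one class in this way is what makes the volumes in (10) add up; the overlaps have zero volume because ties occur only on the boundaries between regions.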

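Since $M_X$ is a closed surface, it divides $\mathbb{R}^3$ into two regions $\Omega_X$ and $\Omega'_X$ such that $\partial\Omega_X = \partial\Omega'_X = M_X$
and $\mathbb{R}^3 = \Omega_X \cup M_X \cup \Omega'_X$. We have $P_X \subset \Omega_X$, and all nuclear centers of atoms in the water molecules in $\mathcal{R}_X$
are contained in $\Omega'_X$. Moreover, $\Omega_X$ is bounded and therefore has a volume $V(\Omega_X)$. Define the hydrophobicity
subsurface $M_{X,i}$, $1 \le i \le H$, as

$$M_{X,i} = M_X \cap \overline{\mathcal{R}_{X,i}}. \tag{11}$$

Let $A(S)$ be the area of a surface $S \subset \mathbb{R}^3$, then

$$M_X = \bigcup_{i=1}^{H} M_{X,i}, \quad A(M_X) = \sum_{i=1}^{H} A(M_{X,i}), \quad \text{and if } i \ne j, \text{ then } A(M_{X,i} \cap M_{X,j}) = 0. \tag{12}$$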

Although the shape of each atom in $\mathfrak{U}$ is well defined by the theory of atoms in molecules ([1] and [M]),
what concerns us here is the overall shape of the structure $P_X$. The cutoff of electron density $\rho \ge 0.001$ au
([1] and [M]) gives the overall shape of a molecular structure that is just like $P_X$, a bunch of overlapping
balls. Moreover, the boundary of the $\rho \ge 0.001$ au cutoff is very similar to the molecular surface $M_X$,
which was defined by Richards in 1977 [13] and was shown to have more physical meaning as the boundary
surface of the conformation $P_X$ ([9] and [22]).

A.1. The Schrödinger Equation. For any conformation $X \in \mathfrak{X}$, let $W = (w_1, \dots, w_i, \dots, w_N) \in \mathbb{R}^{3N}$ be
the nuclear centers of water molecules in $\mathcal{R}_X$ and $E = (e_1, \dots, e_i, \dots, e_L) \in \mathbb{R}^{3L}$ be electronic positions of
all electrons in $\mathcal{T}_X$. Then the Hamiltonian for the system $\mathcal{T}_X$ is

$$\hat{H} = -\sum_{i=1}^{M} \frac{\hbar^2}{2m_i}\nabla_i^2 - \frac{\hbar^2}{2m_w}\sum_{i=1}^{N}\nabla_i^2 - \frac{\hbar^2}{2m_e}\sum_{i=1}^{L}\nabla_i^2 + \hat{V}(X, W, E), \tag{13}$$

where $m_i$ is the nuclear mass of atom $a_i$ in $\mathfrak{U}$, $m_w$ and $m_e$ are the masses of a water molecule and an electron;
$\nabla_i^2$ is the Laplacian in the corresponding $\mathbb{R}^3$; and $\hat{V}$ is the potential.



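A.2. The First Step of the Born-Oppenheimer Approximation. Depending on the shape of $P_X$, for
each $i$, $1 \le i \le H$, the maximum number $N_{X,i}$ of water molecules contained in $\mathcal{R}_{X,i}$ varies. Theoretically
we consider all cases, i.e., there are $0 \le N_i \le N_{X,i}$ water molecules in $\mathcal{R}_{X,i}$, $1 \le i \le H$. Let $M_0 = 0$,
$M_i = \sum_{j \le i} N_j$, and $W_i = (w_{M_{i-1}+1}, \dots, w_{M_{i-1}+j}, \dots, w_{M_i}) \in \mathbb{R}^{3N_i}$, $1 \le i \le H$, and let
$W = (W_1, W_2, \dots, W_H) \in \mathbb{R}^{3M_H}$ denote the nuclear positions of water molecules in $\mathcal{R}_X$. As well, there will be
all possible numbers $0 \le N_e < \infty$ of electrons in $\mathcal{T}_X$. Let $E = (e_1, e_2, \dots, e_{N_e}) \in \mathbb{R}^{3N_e}$ denote their
positions. For each fixed $X \in \mathfrak{X}$ and $N = (N_1, \dots, N_H, N_e)$, the Born-Oppenheimer approximation has the
Hamiltonian

$$\hat{H}_X = -\frac{\hbar^2}{2m_w}\sum_{i=1}^{M_H}\nabla_i^2 - \frac{\hbar^2}{2m_e}\sum_{i=1}^{N_e}\nabla_i^2 + \hat{V}(X, W, E).$$

The eigenfunctions $\psi_i^{X,N}(W, E) \in L_0^2\big(\prod_{j=1}^{H}\mathcal{R}_{X,j}^{N_j} \times \mathcal{T}_X^{N_e}\big) = \mathcal{H}_{X,N}$, $1 \le i < \infty$, comprise an orthonormal
basis of $\mathcal{H}_{X,N}$. Denote their eigenvalues (energy levels) as $E^i_{X,N}$; then $\hat{H}_X \psi_i^{X,N} = E^i_{X,N}\,\psi_i^{X,N}$.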

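A.3. Grand Partition Function and Grand Canonic Density Operator. In the following we will
use the notations and definitions in Chapter 10]. Let $k_B$ be the Boltzmann constant, and set $\beta = 1/k_B T$.
Since the numbers $N_i$ and $N_e$ vary, we should adopt the grand canonic ensemble. Let $\mu_i$ be the chemical
potentials, that is, the Gibbs free energy per water molecule in $\mathcal{R}_{X,i}$. Let $\mu_e$ be the electron chemical potential.
The grand canonic density operator is ([IS] and [11])

$$\hat\rho_X = \exp\Big\{-\beta\Big[\hat{H}_X - \sum_{i=1}^{H}\mu_i \hat{N}_i - \mu_e \hat{N}_e - \Omega(X)\Big]\Big\},$$

where the grand partition function is

$$\exp[-\beta\,\Omega(X)] = \operatorname{Trace}\Big\{\exp\Big[-\beta\Big(\hat{H}_X - \sum_{i=1}^{H}\mu_i \hat{N}_i - \mu_e \hat{N}_e\Big)\Big]\Big\} = \sum_{i,N} \exp\Big[-\beta\Big(E^i_{X,N} - \sum_{j=1}^{H}\mu_j N_j - \mu_e N_e\Big)\Big].$$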






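A.4. The Gibbs Free Energy G(X). According to [25, page 273], under the grand canonic ensemble the
entropy $S(X) = S(\mathcal{T}_X)$ of the system $\mathcal{T}_X$ is

$$S(X) = -k_B \operatorname{Trace}(\hat\rho_X \ln \hat\rho_X) = -k_B \langle \ln \hat\rho_X \rangle = k_B\beta\Big\langle \hat{H}_X - \Omega(X) - \sum_{i=1}^{H}\mu_i \hat{N}_i - \mu_e \hat{N}_e \Big\rangle = \Big[\langle \hat{H}_X \rangle - \Omega(X) - \sum_{i=1}^{H}\mu_i \langle \hat{N}_i \rangle - \mu_e \langle \hat{N}_e \rangle\Big]\Big/ T. \tag{14}$$

Here we denote by $\langle \hat{N}_i \rangle = N_i(X)$ the mean numbers of water molecules in $\mathcal{R}_{X,i}$, $1 \le i \le H$, and by $\langle \hat{N}_e \rangle = N_e(X)$
the mean number of electrons in $\mathcal{T}_X$. The inner energy $\langle \hat{H}_X \rangle$ of the system $\mathcal{T}_X$ is denoted as $U(X) = U(\mathcal{T}_X)$.
The term $\Omega(X)$ is a state function with variables $T, V, \mu_1, \dots, \mu_H$, and $\mu_e$, and is called the grand canonic
potential ([25, page 27]) or the thermodynamic potential ([11, page 33]). By the general thermodynamic
equations [11, pages 5 and 6]:

$$d\Omega(X) = -S\,dT - P\,dV - \sum_{i=1}^{H} N_i\,d\mu_i - N_e\,d\mu_e, \qquad \alpha\,\Omega(X) = \Omega(X)(T, \alpha V, \mu_1, \dots, \mu_H, \mu_e),$$

we see that $\Omega(X)(T, V, \mu_1, \dots, \mu_H, \mu_e) = -PV(X)$, where $V(X) = V(\mathcal{T}_X)$ is the volume of the thermodynamic
system $\mathcal{T}_X$. Thus by (14) we obtain the Gibbs free energy $G(X) = G(\mathcal{T}_X)$ in (2):

$$G(X) = G(\mathcal{T}_X) = PV(X) + U(X) - TS(X) = \sum_{i=1}^{H} \mu_i N_i(X) + \mu_e N_e(X).$$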
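The entropy identity (14) and the resulting expression for $G(X)$ can be checked numerically on a toy ensemble. The sketch below is only an illustration under stated assumptions: a single particle species instead of $H$ water classes plus electrons, four made-up states given as (energy, particle number) pairs, and reduced units with $k_B = 1$. None of these values come from the paper.

```python
import math

K_B = 1.0  # Boltzmann constant in reduced units (assumption)

def grand_canonical(states, mu, T):
    """Toy grand canonic ensemble over (energy, particle_number) states.

    Returns the grand potential Omega and the state probabilities, i.e. the
    diagonal of the density operator exp{-beta [E - mu*N - Omega]}.
    """
    beta = 1.0 / (K_B * T)
    weights = [math.exp(-beta * (E - mu * N)) for E, N in states]
    partition = sum(weights)               # equals exp(-beta * Omega)
    omega = -math.log(partition) / beta
    probs = [w / partition for w in weights]
    return omega, probs

# Hypothetical energy levels and particle numbers
states = [(0.0, 0), (0.5, 1), (0.7, 1), (1.4, 2)]
mu, T = 0.3, 1.2
omega, probs = grand_canonical(states, mu, T)

U = sum(p * E for p, (E, N) in zip(probs, states))        # <H>
N_mean = sum(p * N for p, (E, N) in zip(probs, states))   # <N>
S_direct = -K_B * sum(p * math.log(p) for p in probs)     # -k_B Trace(rho ln rho)
S_formula = (U - omega - mu * N_mean) / T                 # right-hand side of (14)

# With Omega = -PV, the Gibbs free energy is G = PV + U - TS = U - Omega - TS.
G = U - omega - T * S_direct
print(S_direct, S_formula, G, mu * N_mean)
```

In this toy setting the two entropy values agree and $G$ reduces to $\mu \langle N \rangle$, the single-species analogue of $\sum_i \mu_i N_i(X) + \mu_e N_e(X)$.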

A.5. Converting Formula (2) to Geometric Form (3). Since every water molecule in $\mathcal{R}_{X,i}$ has contact
with the surface $M_{X,i}$, $N_i(X)$ is proportional to the area $A(M_{X,i})$. Therefore, there are $\nu_i > 0$ such that

$$\nu_i A(M_{X,i}) = N_i(X), \quad 1 \le i \le H. \tag{15}$$

Similarly, there will be an $a > 0$ such that $aV(\mathcal{T}_X) = N_e(X)$.

By the definition of $\mathcal{T}_X$ and $\Omega_X$, we have roughly $V(\mathcal{T}_X \setminus \Omega_X) = d_w A(M_X)$. Thus

$$N_e(X) = aV(\mathcal{T}_X) = a[V(\Omega_X) + V(\mathcal{T}_X \setminus \Omega_X)] = aV(\Omega_X) + a d_w A(M_X). \tag{16}$$

Substituting (15) and (16) into (2), we get (3).
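The substitution can be spelled out in a few lines of code. This is a hedged sketch rather than the author's implementation: `gibbs_geometric` evaluates the expression obtained by inserting (15) and (16) into (2), namely $a\mu_e V(\Omega_X) + a d_w \mu_e A(M_X) + \sum_i \nu_i \mu_i A(M_{X,i})$, and all numerical inputs (chemical potentials, proportionality constants, areas, volume) are hypothetical placeholders in arbitrary units.

```python
import math

def gibbs_from_counts(mu, n_water, mu_e, n_e):
    """Count form: G = sum_i mu_i * N_i + mu_e * N_e."""
    return sum(m * n for m, n in zip(mu, n_water)) + mu_e * n_e

def gibbs_geometric(mu, nu, areas, mu_e, a, d_w, volume):
    """Geometric form: a*mu_e*V(Omega_X) + a*d_w*mu_e*A(M_X) + sum_i nu_i*mu_i*A(M_{X,i})."""
    total_area = sum(areas)  # A(M_X) is the sum of the subsurface areas, by (12)
    return (a * mu_e * volume
            + a * d_w * mu_e * total_area
            + sum(v * m * s for v, m, s in zip(nu, mu, areas)))

# Hypothetical inputs for H = 3 hydrophobicity classes
mu = [-0.20, 0.10, 0.40]        # chemical potentials mu_i
nu = [0.10, 0.12, 0.08]         # water molecules per unit area, nu_i
areas = [300.0, 450.0, 250.0]   # A(M_{X,i})
mu_e, a, d_w, volume = -0.05, 0.30, 2.8, 12000.0

n_water = [v * s for v, s in zip(nu, areas)]   # relation (15)
n_e = a * (volume + d_w * sum(areas))          # relation (16)

g_counts = gibbs_from_counts(mu, n_water, mu_e, n_e)
g_geom = gibbs_geometric(mu, nu, areas, mu_e, a, d_w, volume)
print(g_counts, g_geom)
```

Both evaluations give the same number, which is simply the statement that the geometric form is the count form rewritten in terms of one volume and $H + 1$ areas.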

We are applying fundamental physical laws directly to protein folding. The question is whether we can do so.
We will check how rigorous the derivation is and ask whether it contains any fundamental errors. We will
also discuss possible ways to modify the formula or the derivation.

A.6. How Rigorous Is the Derivation? We adopted two common tools in physics, the first step of
the Born-Oppenheimer approximation in quantum mechanics and the grand canonic ensemble in statistical
physics, to obtain formula (2).

A.6.1. The Born-Oppenheimer Approximation. The Born-Oppenheimer approximation "treats the electrons
as if they are moving in the field of fixed nuclei. This is a good approximation because, loosely speaking,
electrons move much faster than nuclei and will almost instantly adjust themselves to a change in nuclear
position." [23]. Since the mass of a water molecule is much less than the mass of a protein, we can extend
this approximation: when $X$ changes, the other particles, electrons and water molecules, will
quickly adjust themselves to the change as well.

A.6.2. Statistical Physics in General and the Grand Canonic Ensemble in Particular. "Up to now there
is no evidence to show that statistical physics itself is responsible for any mistakes" [11, Preface]. Via the
ensemble theory of statistical mechanics we consider only one protein molecule and the particles in its immediate
environment. This is justified since, as pointed out in [11, page 10], "When the duration of measurement is short,
or the number of particles is not large enough, the concept of ensemble theory is still valid." And among
different ensembles, "Generally speaking, the grand canonic ensemble, with the least restrictions, is the most
convenient in the mathematical treatment." page 16]. In fact, we have tried the canonic ensemble and
ended with a result in which we would really have to calculate the eigenvalues of the quantum mechanical system.

Our derivations only put together the two very common and sound practices: the Born-Oppenheimer 
approximation (only the first step) and the grand canonic ensemble, and apply them to the protein folding 
problem. As long as protein folding obeys the fundamental physical laws, there should not be any serious 
error with the derivation. 

A.7. Equilibrium and Quasi-Equilibrium. A protein's structure will never be in equilibrium; in fact,
even the native structure is only a snapshot of the constant vibration state of the structure. The best
description of conformation $X$ is given in [4, Chapter 3]: we can simply think that a conformation $X$
actually is any point $Y$ contained in a union of tiny balls centered at $\mathbf{x}_i$, $i = 1, \dots, M$. In this sense, we can
only anticipate a quasi-equilibrium description (such as the heat engine, page 94]) of the thermodynamic
states of the protein folding. This has been built into the Thermodynamic Principle of Protein Folding.
So quantities such as $S(X)$, $\Omega(X)$, and $G(X)$ can only be understood in this sense. That is, observing
a concrete folding process one will see a series of conformations $X_i$, $i = 1, 2, 3, \dots$. The Thermodynamic
Principle then says that if we measure the Gibbs free energy $G(X_i)$, then eventually $G(X_i)$ will converge
to a minimum value and the $X_i$ will eventually approach the native structure, while at no time is any
conformation $X_i$ or thermodynamic system $\mathcal{T}_{X_i}$ really in an equilibrium state.

Appendix B. Kinetics Formulae

Let $\mathbf{x}_i = (x_i, y_i, z_i)$; we can write $F_i = -\nabla_{\mathbf{x}_i} G(X) = -(G_{x_i}, G_{y_i}, G_{z_i})(X)$. The calculation of $G_{x_i}(X)$, for
example, is via the Lie vector field induced by moving the atomic position $\mathbf{x}_i$. In fact, any infinitesimal change of
structure $X$ will induce a Lie vector field $L : X \to \mathbb{R}^3$. For example, moving $\mathbf{x}_i$ from $\mathbf{x}_i$ to $\mathbf{x}_i + (\Delta x_i, 0, 0)$ while
keeping the other nuclear centers fixed will induce $L_{x_i} : X \to \mathbb{R}^3$, such that $L_{x_i}(\mathbf{x}_i) = (1, 0, 0)$ and $L_{x_i}(\mathbf{x}_j) = (0, 0, 0)$
for $j \ne i$. Similarly we can describe $L_{y_i}$ and $L_{z_i}$. Then write $G_{x_i} = G_{L_{x_i}}$, etc., and

$$\nabla_{\mathbf{x}_i} G(X) = (G_{L_{x_i}}, G_{L_{y_i}}, G_{L_{z_i}})(X). \tag{17}$$



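Rotating around a covalent bond $b_{ij}$ also induces a Lie vector field $L_{b_{ij}} : X \to \mathbb{R}^3$. In fact, if $a_i a_j$ form the
covalent bond $b_{ij}$, then the bond axis is

$$\mathbf{b}_{ij} = \frac{\mathbf{x}_j - \mathbf{x}_i}{|\mathbf{x}_j - \mathbf{x}_i|}. \tag{18}$$

If $b_{ij}$ is rotatable, i.e., (1) it is chemically allowed to rotate, and (2) cutting off $b_{ij}$ from the molecular graph of $X$
(see, for example, [H page 32]) leaves two components, denote all nuclear centers in one component by $R_{b_{ij}}$
and the others by $F_{b_{ij}}$. We can rotate all centers in $R_{b_{ij}}$ around $\mathbf{b}_{ij}$ by a certain angle while keeping all centers in
$F_{b_{ij}}$ fixed. The induced Lie vector field $L_{b_{ij}}$ will be

$$L_{b_{ij}}(\mathbf{x}_k) = (\mathbf{x}_k - \mathbf{x}_i) \wedge \mathbf{b}_{ij} \ \text{ if } \mathbf{x}_k \in R_{b_{ij}}; \qquad L_{b_{ij}}(\mathbf{x}_k) = 0 \ \text{ if } \mathbf{x}_k \in F_{b_{ij}}. \tag{19}$$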
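The rotation-induced field can be made concrete with a small numerical check. The sketch below is an illustration under stated assumptions, not code from the paper: the five-atom chain, the choice of rotating set, and all function names are hypothetical. It builds the field $(\mathbf{x}_k - \mathbf{x}_i) \wedge \mathbf{b}_{ij}$ and compares a small linear displacement $\mathbf{x}_k + tL(\mathbf{x}_k)$ with an exact rotation computed by Rodrigues' formula. With this ordering of the cross product, the field generates a rotation about $\mathbf{b}_{ij}$ by a negative right-handed angle.

```python
import math

def sub(u, v): return tuple(a - b for a, b in zip(u, v))
def add(u, v): return tuple(a + b for a, b in zip(u, v))
def scale(c, u): return tuple(c * a for a in u)
def dot(u, v): return sum(a * b for a, b in zip(u, v))
def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def bond_axis(x_i, x_j):
    """Unit vector along the bond from x_i to x_j."""
    d = sub(x_j, x_i)
    return scale(1.0 / math.sqrt(dot(d, d)), d)

def rotation_field(coords, i, j, rotating):
    """L(x_k) = (x_k - x_i) ^ b_ij on the rotating set, zero on the fixed set."""
    b = bond_axis(coords[i], coords[j])
    zero = (0.0, 0.0, 0.0)
    return [cross(sub(x, coords[i]), b) if k in rotating else zero
            for k, x in enumerate(coords)]

def rotate_about(x, origin, axis, angle):
    """Right-handed rotation of x about the line through origin along unit axis."""
    v = sub(x, origin)
    c, s = math.cos(angle), math.sin(angle)
    return add(origin, add(add(scale(c, v), scale(s, cross(axis, v))),
                           scale(dot(axis, v) * (1.0 - c), axis)))

# Hypothetical chain of five nuclear centers; bond (1, 2) rotates atoms 3 and 4.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.5, 1.0, 0.0),
          (2.5, 1.2, 0.8), (3.0, 2.0, 1.5)]
i, j, rotating = 1, 2, {3, 4}
field = rotation_field(coords, i, j, rotating)
b = bond_axis(coords[i], coords[j])

t = 1e-4
max_err = 0.0
for k in rotating:
    linear = add(coords[k], scale(t, field[k]))            # x_k + t L(x_k)
    exact = rotate_about(coords[k], coords[i], b, -t)      # exact rotation by -t
    max_err = max(max_err, math.dist(linear, exact))
print(max_err)  # second order in t
```

The discrepancy between the linear step and the exact rotation is of second order in $t$, which is why a field of this kind describes an infinitesimal bond rotation.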

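Any such Lie vector field $L$ will generate a family of conformations $X_t = (\mathbf{x}_{1t}, \dots, \mathbf{x}_{kt}, \dots, \mathbf{x}_{Mt})$, where

$$\mathbf{x}_{kt} = \mathbf{x}_k + tL(\mathbf{x}_k), \quad k = 1, \dots, M. \tag{20}$$

The derivative $G_L(X)$ is given by

$$G_L(X) = a\mu_e V_L(\Omega_X) + a d_w \mu_e A_L(M_X) + \sum_{i=1}^{H} \nu_i \mu_i A_L(M_{X,i}),$$

where

$$V_L(\Omega_X) = \int_{M_X} L \cdot N \, d\mathcal{H}^2, \qquad A_L(M_X) = -2\int_{M_X} H\,(L \cdot N)\, d\mathcal{H}^2, \tag{21}$$

where $N$ is the outer unit normal of $M_X$, $H$ the mean curvature of $M_X$, and $\mathcal{H}^k$ the Hausdorff measure.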



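Define $f_{t,i}$ as $f_{t,i}(x) = \operatorname{dist}(x, M_{X_t,i}) - \operatorname{dist}(x, M_{X_t} \setminus M_{X_t,i})$, and denote

$$\frac{d f_{0,i}}{dt} = \frac{\partial f_{t,i}}{\partial t}\Big|_{t=0}; \tag{22}$$

then, letting $\eta$ be the unit outward conormal vector of $\partial M_{X,i}$ (normal to $\partial M_{X,i}$ but tangent to $M_X$),

$$A_L(M_{X,i}) = -2\int_{M_{X,i}} H\,(L \cdot N)\, d\mathcal{H}^2 + \int_{\partial M_{X,i}} \Big[ L \cdot \eta - \frac{d f_{0,i}/dt}{|\nabla_{M_X} f_{0,i}|} \Big]\, d\mathcal{H}^1. \tag{23}$$
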
The family $X_t$ is all the information we need in calculating the molecular surface $M_{X_t}$ [17]. But the kinetic
formula $G_L(X)$ can help us quickly reach a new conformation $Y$ from $X$ without really calculating $X_t$.
For example, we can list all rotatable covalent bonds of the protein as $(b_1, \dots, b_i, \dots, b_L)$ and then
simultaneously rotate them to get new conformations very quickly by moving along the negative of the gradient

$$(G_{L_{b_1}}, \dots, G_{L_{b_i}}, \dots, G_{L_{b_L}})(X). \tag{24}$$

To calculate the above formulae we actually have to translate them into formulae on the molecular surface
$M_X$. These translations are given in [H]; they are calculable (all integrals are integrable, i.e., can be
expressed by analytic formulae with variables $X$) and were calculated piecewise on $M_X$. If the rotation
around $b_i$ with rotating angle $-sG_{L_{b_i}}(X)$ is denoted as $R_i$, we can then get the new conformation
$Y_s = R_L \circ R_{L-1} \circ \dots \circ R_1(X)$, where $s > 0$ is a suitable step length. The order of rotations in fact is irrelevant,
i.e., by any order we will always get the same conformation $Y_s$, as proved in [H].

This is in fact Newton's fastest descending method (steepest descent); it reduces the Gibbs free energy $G(X)$ most
efficiently. The aforementioned simulations in [6] used this method.
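The update rule described above can be sketched on a toy problem. Everything in the following block is an assumption made for illustration: `toy_energy` (the squared end-to-end distance of a five-atom chain) merely stands in for $G(X)$ and is not the paper's formula, the bond derivatives are obtained by the chain rule $\sum_k \nabla_{\mathbf{x}_k} G \cdot L_b(\mathbf{x}_k)$ rather than by surface integrals, and the atoms after the second bond atom are taken as the rotating set.

```python
import math

def sub(u, v): return tuple(a - b for a, b in zip(u, v))
def add(u, v): return tuple(a + b for a, b in zip(u, v))
def scale(c, u): return tuple(c * a for a in u)
def dot(u, v): return sum(a * b for a, b in zip(u, v))
def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])
def unit(u): return scale(1.0 / math.sqrt(dot(u, u)), u)

def rotate_about(x, origin, axis, angle):
    """Right-handed rotation of x about the line through origin along unit axis."""
    v = sub(x, origin)
    c, s = math.cos(angle), math.sin(angle)
    return add(origin, add(add(scale(c, v), scale(s, cross(axis, v))),
                           scale(dot(axis, v) * (1.0 - c), axis)))

def toy_energy(coords):
    """Stand-in for G(X): squared end-to-end distance (NOT the paper's formula)."""
    d = sub(coords[-1], coords[0])
    return dot(d, d)

def toy_gradient(coords):
    """Gradient of toy_energy with respect to every atom position."""
    d = sub(coords[-1], coords[0])
    grad = [(0.0, 0.0, 0.0)] * len(coords)
    grad[0] = scale(-2.0, d)
    grad[-1] = scale(2.0, d)
    return grad

def bond_derivatives(coords, bonds):
    """Derivative along each bond-rotation field, via the chain rule."""
    grad = toy_gradient(coords)
    derivs = []
    for i, j in bonds:
        b = unit(sub(coords[j], coords[i]))
        derivs.append(sum(dot(grad[k], cross(sub(coords[k], coords[i]), b))
                          for k in range(j + 1, len(coords))))
    return derivs

def descent_step(coords, bonds, s):
    """Rotate every bond by the flow parameter -s * (derivative at the old X)."""
    derivs = bond_derivatives(coords, bonds)
    new = list(coords)
    for (i, j), g in zip(bonds, derivs):
        b = unit(sub(new[j], new[i]))
        # The field (x_k - x_i) ^ b rotates by a negative right-handed angle,
        # so flow parameter -s*g corresponds to the right-handed angle +s*g.
        for k in range(j + 1, len(new)):
            new[k] = rotate_about(new[k], new[i], b, s * g)
    return new

X0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.5, 1.0, 0.0),
      (2.5, 1.2, 0.8), (3.0, 2.0, 1.5)]
BONDS = [(1, 2), (2, 3)]   # hypothetical rotatable bonds
X1 = descent_step(X0, BONDS, 0.01)
print(toy_energy(X0), toy_energy(X1))
```

Because only bond rotations are applied, all bond lengths are preserved while the stand-in energy decreases for a sufficiently small step length $s$.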